feat(cosmos-perf): record server-reported request duration as backend latency by tvaron3 · Pull Request #4316 · Azure/azure-sdk-for-rust

tvaron3 · 2026-04-29T23:55:39Z

Summary

Enhances the Cosmos DB Rust perf runner with two measurement improvements:

1. Backend (Server-Reported) Latency

Parses the x-ms-request-duration-ms response header and tracks it alongside the existing client-observed wall-clock latency. This separates network transit time from server processing time in performance reports.
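
A minimal sketch of the header parsing described above, assuming the header value arrives as an optional string of (possibly fractional) milliseconds. The names `parse_backend_duration` and `non_backend_overhead` are hypothetical and are not taken from the PR, whose actual helper is `extract_backend_duration`.

```rust
use std::time::Duration;

/// Hypothetical helper: parse the raw value of the
/// `x-ms-request-duration-ms` header (milliseconds, possibly fractional).
/// Returns `None` when the header is absent or not a usable number.
fn parse_backend_duration(header_value: Option<&str>) -> Option<Duration> {
    let millis: f64 = header_value?.trim().parse().ok()?;
    // Reject NaN, infinities, and negative values instead of panicking.
    if !millis.is_finite() || millis < 0.0 {
        return None;
    }
    // Fallible conversion: out-of-range values become `None`.
    Duration::try_from_secs_f64(millis / 1000.0).ok()
}

/// Hypothetical helper: the part of the client-observed latency that is
/// not accounted for by server processing (network plus client overhead).
fn non_backend_overhead(client: Duration, backend: Option<Duration>) -> Option<Duration> {
    // Saturate at zero in case the server-reported time exceeds the
    // client measurement (for example, due to clock granularity).
    backend.map(|b| client.saturating_sub(b))
}
```

Returning `Option<Duration>` keeps a missing or malformed header from failing the operation; the caller can simply skip the backend sample for that request.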

2. Cgroup CPU Utilization (`cgroup_cpu_percent`)

Adds a new metric that reads cgroupv2 cpu.stat and cpu.max to compute CPU utilization relative to the container's allocated quota. This matches what kubectl top pods reports and replaces the misleading system_cpu_percent (which reads /proc/stat and shows host-level CPU, appearing artificially low in containers).
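
The quota-relative calculation can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `read_cgroup_cpu_percent()`: the function names are hypothetical, the file contents are passed in as strings (in a container they would typically come from `/sys/fs/cgroup/cpu.stat` and `/sys/fs/cgroup/cpu.max`), and the caller is assumed to keep the previous `usage_usec` reading and the elapsed wall-clock time.

```rust
/// Extract `usage_usec` from the contents of a cgroup v2 `cpu.stat` file.
fn parse_usage_usec(cpu_stat: &str) -> Option<u64> {
    cpu_stat.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            (Some("usage_usec"), Some(value)) => value.parse().ok(),
            _ => None,
        }
    })
}

/// Convert the contents of `cpu.max` ("<quota> <period>" or "max <period>")
/// into the number of CPU cores the cgroup may use.
/// Returns `None` when no quota is set or the values are unusable.
fn parse_quota_cores(cpu_max: &str) -> Option<f64> {
    let mut parts = cpu_max.split_whitespace();
    let quota = parts.next()?;
    let period_usec: u64 = parts.next()?.parse().ok()?;
    // "max" means unlimited; a zero period would divide by zero.
    if quota == "max" || period_usec == 0 {
        return None;
    }
    let quota_usec: u64 = quota.parse().ok()?;
    let cores = quota_usec as f64 / period_usec as f64;
    (cores > 0.0).then_some(cores)
}

/// Delta-based utilization: CPU time consumed between two readings,
/// divided by the CPU time the quota allowed over the same interval.
fn cgroup_cpu_percent(
    prev_usage_usec: u64,
    curr_usage_usec: u64,
    elapsed_usec: u64,
    cores: f64,
) -> Option<f32> {
    if elapsed_usec == 0 || cores <= 0.0 {
        return None;
    }
    let used = curr_usage_usec.saturating_sub(prev_usage_usec) as f64;
    Some((used / (elapsed_usec as f64 * cores) * 100.0) as f32)
}
```

For example, a pod with a 2-core quota that consumed 1.56 s of CPU time over a 1 s interval would report about 78%.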

Changes

stats.rs: New read_cgroup_cpu_percent() function with delta-based measurement, division-by-zero guards, and safe u128→u64 clamping in histogram recording
runner.rs: Wire cgroup_cpu_percent: Option<f32> through PerfResult (serialized to Cosmos DB → ADX)
sdk/cosmos/.cspell.json: Add cgroup terminology to ignore list

Testing

Verified on AKS pods in cosmos-perf-rg — cgroup CPU reports ~78% matching kubectl top
Data flows through Cosmos DB change feed → ADX → Grafana dashboard

… latency

Reads x-ms-request-duration-ms response header on every Cosmos request in the perf binary and emits backend_{min,max,mean,p50,p90,p99}_ms per operation per reporting interval. Surfaces server-side processing time separately from the client-observed wall-clock latency so network plus client-queue overhead can be isolated downstream.

Implementation:
- New helper extract_backend_duration in operations/mod.rs parses the header value as milliseconds (f64) into a Duration.
- Operation::execute now returns Result<Option<Duration>> instead of Result<()>; each per-op implementation reads the header off the response (or sums across pages for QueryItems via into_pages()).
- Stats gains a parallel HdrHistogram for backend durations; samples are independent of client samples (intervals where 0 backend durations were observed surface as None on Summary, which serializes as null and is skipped via skip_serializing_if).
- PerfResult struct gains 6 Option<f64> backend_*_ms fields.

Existing fields, behaviour, and JSON keys are unchanged. Old payloads without backend_* keys ingest cleanly into ADX (the schema mapping treats missing keys as null).

Tests:
- backend_durations_aggregate_separately_from_client verifies the two histograms are independent.
- backend_summary_is_none_when_no_samples verifies the all-None path when the header is absent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
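
The independence of the two sample sets, and the `None` summary when no backend durations were seen, can be illustrated with a simplified stand-in. This sketch assumes a sorted `Vec` with nearest-rank percentiles in place of the HdrHistogram the commit mentions; `LatencySamples`, `OpStats`, and the `Summary` layout here are hypothetical names for illustration only.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
struct Summary {
    min_ms: f64,
    max_ms: f64,
    mean_ms: f64,
    p50_ms: f64,
    p90_ms: f64,
    p99_ms: f64,
}

/// Stand-in for a histogram: stores raw samples in microseconds.
#[derive(Default)]
struct LatencySamples {
    micros: Vec<u64>,
}

impl LatencySamples {
    fn record(&mut self, d: Duration) {
        // Saturate rather than wrap if the value does not fit in u64.
        self.micros.push(u64::try_from(d.as_micros()).unwrap_or(u64::MAX));
    }

    /// Returns `None` when nothing was recorded in this interval.
    fn summary(&self) -> Option<Summary> {
        if self.micros.is_empty() {
            return None;
        }
        let mut sorted = self.micros.clone();
        sorted.sort_unstable();
        let n = sorted.len();
        let to_ms = |us: u64| us as f64 / 1000.0;
        // Nearest-rank percentile over the sorted samples.
        let pct = |p: f64| {
            let rank = ((p / 100.0) * n as f64).ceil() as usize;
            to_ms(sorted[rank.clamp(1, n) - 1])
        };
        let total: u128 = sorted.iter().map(|&us| us as u128).sum();
        Some(Summary {
            min_ms: to_ms(sorted[0]),
            max_ms: to_ms(sorted[n - 1]),
            mean_ms: total as f64 / n as f64 / 1000.0,
            p50_ms: pct(50.0),
            p90_ms: pct(90.0),
            p99_ms: pct(99.0),
        })
    }
}

/// Client and backend samples are kept apart so one never skews the other.
#[derive(Default)]
struct OpStats {
    client: LatencySamples,
    backend: LatencySamples,
}

impl OpStats {
    fn record(&mut self, client: Duration, backend: Option<Duration>) {
        self.client.record(client);
        if let Some(b) = backend {
            self.backend.record(b);
        }
    }
}
```

With this shape, a request whose response lacks the header still contributes a client sample, and an interval with no backend samples yields `None` for the backend summary.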

…ws' into feat/perf-backend-latency-v2 # Conflicts: # sdk/cosmos/azure_data_cosmos_perf/src/operations/create_item.rs # sdk/cosmos/azure_data_cosmos_perf/src/operations/upsert_item.rs

Renamed bmean -> backend_mean_dur, bmin -> back_min, bmax -> back_max to avoid cspell 'Unknown word' errors in CI Analyze step. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ting Read cgroupv2 cpu.stat and cpu.max to compute pod-level CPU utilization that matches what kubectl top reports. Falls back to None when not running in a cgroup (e.g., local dev). Wire through PerfResult for ADX ingestion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add 'cgroupv' and 'usec' to the allowed words list to fix CI spell-check failures from the cgroup CPU metric addition. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move 'cgroupv' and 'usec' from .vscode/cspell.json to the local sdk/cosmos/.cspell.json ignoreWords list. Reverts the root config. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds backend (server-reported) latency measurement to the Cosmos perf runner by parsing x-ms-request-duration-ms, aggregating it alongside existing wall-clock latency, and emitting per-interval backend percentile/summary fields.

Changes:

Parse x-ms-request-duration-ms into an optional Duration and plumb it through Operation::execute.
Track backend-duration histograms separately from client wall-clock latency and emit backend summary stats (plus a “BackendP99” column in the console report).
Add cgroup CPU quota utilization metric reporting (cgroupv2) and update editor spellchecker word list.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.


| File | Description |
| --- | --- |
| sdk/cosmos/azure_data_cosmos_perf/src/operations/mod.rs | Adds `extract_backend_duration()` and changes `Operation::execute` to return `Option<Duration>`. |
| sdk/cosmos/azure_data_cosmos_perf/src/operations/create_item.rs | Returns backend duration extracted from response headers. |
| sdk/cosmos/azure_data_cosmos_perf/src/operations/read_item.rs | Returns backend duration extracted from response headers. |
| sdk/cosmos/azure_data_cosmos_perf/src/operations/upsert_item.rs | Returns backend duration extracted from response headers. |
| sdk/cosmos/azure_data_cosmos_perf/src/operations/query_items.rs | Iterates query by pages and sums backend duration across pages. |
| sdk/cosmos/azure_data_cosmos_perf/src/stats.rs | Adds backend histograms/summary fields and introduces cgroup CPU percent metric collection/printing. |
| sdk/cosmos/azure_data_cosmos_perf/src/runner.rs | Records backend durations into stats and serializes backend/cgroup metrics in result documents. |
| .vscode/cspell.json | Adds words related to the new cgroup metrics (and reformats the file). |
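
For the paged query case in query_items.rs, summing per-page backend durations might look roughly like the sketch below. It assumes each page has already been reduced to an `Option<Duration>`; the function name is hypothetical, and treating pages without the header as contributing nothing is an assumption, not something the PR states.

```rust
use std::time::Duration;

/// Hypothetical helper: total backend time across all pages of one query.
/// Returns `None` only if no page reported a backend duration.
fn sum_page_backend_durations<I>(pages: I) -> Option<Duration>
where
    I: IntoIterator<Item = Option<Duration>>,
{
    pages
        .into_iter()
        .flatten() // skip pages where the header was missing
        .fold(None, |acc: Option<Duration>, d| {
            Some(acc.map_or(d, |total| total.saturating_add(d)))
        })
}
```

Returning `None` rather than a zero duration keeps "no data" distinguishable from "the server reported 0 ms".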

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Guard against u128→u64 truncation in histogram recording by clamping with .min(u64::MAX as u128) before cast
- Add division-by-zero guard for period_usec==0 and cores<=0.0 in cgroup CPU calculation
- Add 'cgroupv2' to sdk/cosmos/.cspell.json ignore list

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
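
The clamping guard can be shown in isolation. `Duration::as_micros` returns a `u128`, and a bare `as u64` cast would silently wrap for out-of-range values; the function name below is hypothetical and the choice of microseconds as the recording unit is an assumption.

```rust
use std::time::Duration;

/// Convert a duration to microseconds for a u64-valued histogram,
/// clamping to `u64::MAX` instead of truncating the high bits.
fn micros_for_histogram(d: Duration) -> u64 {
    d.as_micros().min(u64::MAX as u128) as u64
}
```

Realistic latencies never approach the limit, so the clamp mainly documents intent and removes a class of silent corruption.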

Restore original 2-space indent and sort order, keeping diff to just the 3 added words (cgroupv, cgroupv2, usec). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

FabianMeiswinkel

LGTM

github-actions Bot added the Cosmos label (The azure_cosmos crate) Apr 29, 2026

github-project-automation Bot added this to CosmosDB Go/Rust Crew Apr 29, 2026

github-project-automation Bot moved this to Todo in CosmosDB Go/Rust Crew Apr 29, 2026

tvaron3 and others added 4 commits April 29, 2026 17:10

Merge remote-tracking branch 'origin/release/azure_data_cosmos-previe…

c247abe


fix: rename variables to pass cspell spell check

14f76a4


fix: add cgroup terminology to cspell dictionary

4a81202


tvaron3 marked this pull request as ready for review April 30, 2026 17:08

Copilot AI review requested due to automatic review settings April 30, 2026 17:08

tvaron3 requested review from a team, LarryOsterman, RickWinter, heaths and ronniegeraghty as code owners April 30, 2026 17:08

Copilot started reviewing on behalf of tvaron3 April 30, 2026 17:09

fix: move cspell words to sdk/cosmos/.cspell.json

aaef33d


Copilot AI reviewed Apr 30, 2026


tvaron3 and others added 3 commits April 30, 2026 10:15

revert: restore .vscode/cspell.json to base branch state

3415108

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

style: minimize cspell.json diff to only new words

bcfab2e


FabianMeiswinkel approved these changes Apr 30, 2026


github-project-automation Bot moved this from Todo to Approved in CosmosDB Go/Rust Crew Apr 30, 2026


tvaron3 wants to merge 9 commits into Azure:release/azure_data_cosmos-previews from tvaron3:feat/perf-backend-latency-v2


3 participants
